This tutorial is heavily based on the Appendices and Supporting Information found in the Open Access {aniMotum} paper:

  • Jonsen ID, Grecian WJ, Phillips L, Carrol G, McMahon C, Harcourt RG, Hindell MA & Patterson TA (2023) aniMotum, an R package for animal movement data: Rapid quality control, behavioural estimation and simulation. Methods in Ecology and Evolution 14(3): 806-816 https://doi.org/10.1111/2041-210X.14060

Please refer to the paper, appendices and the R package documentation for a comprehensive overview of {aniMotum}.


0.1 Before we begin

Please download the {aniMotum} package from R-Universe.

# install from my R-universe repository
install.packages("aniMotum",
                 repos = c("https://cloud.r-project.org",
                           "https://ianjonsen.r-universe.dev"),
                 dependencies = TRUE)

This installs all Imported and Suggested R packages from CRAN and R-universe. If queried, answer Yes to install the source version. Note that if you haven't installed {aniMotum} previously, installation of the dependent packages may take a while, especially if many of them need to be compiled. You should only need to go through this once; subsequent {aniMotum} updates will install much faster.

Alternatively you can download a binary version of aniMotum here: https://ianjonsen.r-universe.dev/aniMotum

For full instructions see the aniMotum homepage on GitHub: https://github.com/ianjonsen/aniMotum


1 Introduction

The aim of this practical is to give you an understanding of how we can use the {aniMotum} R package to process and analyse animal movement data. My aim is that you can go on to apply similar workflows to your own animal movement data.

During this practical we will:

  1. Load in an example data set
  2. Fit a state-space model to regularise and error correct a movement path
  3. Visualise the model fit
  4. Fit a move persistence model to estimate behaviour
  5. Extract the regularised data
  6. Manual mapping
  7. Simulate from the model
  8. Re-route movement paths

This practical will be based on the {tidyverse} style of R coding. You will become familiar with using the pipe operator |> to perform data manipulations, using {ggplot2} to generate publication-quality figures, and using the {sf} package to handle spatial data.

For more information on the {tidyverse} check out the Wickham & Grolemund book ‘R for Data Science’. You can access it for free online here:
https://r4ds.had.co.nz

The project website can be accessed here:
https://www.tidyverse.org

For more information on the {sf} project check out https://r-spatial.github.io/sf/

As discussed in the presentation we can pass data from several different tag types to {aniMotum}. For an outline of how {aniMotum} expects this data to be formatted see: https://ianjonsen.github.io/aniMotum/articles/Overview.html


2 Practical

In March 2019 I was part of an expedition to deploy Satellite Relay Data Loggers on harp seal pups in the Gulf of St Lawrence, Canada. These devices are satellite-linked computers capable of recording a wide range of information and transmitting it via the Argos network. Here we will focus on the location data recorded by the Argos satellites when communicating with three of the tags.

Harp seals give birth on the sea ice when it is at its most southerly extent in late winter/early spring. Females wean their pups after around 2 weeks. We deployed tags on pups that were approximately 3 weeks old, and tracked their movements as they left the sea ice and began their first migration north.

This work was published in Royal Society Open Science and is available Open Access here:

  • Grecian WJ, Stenson GB, Biuw M, Boehme L, Folkow LP, Goulet PJ, Jonsen ID, Malde A, Nordøy ES, Rosing-Asvid A & Smout S (2022) Environmental drivers of population-level variation in the migratory and diving ontogeny of an Arctic top predator Royal Society Open Science 9: 211042 https://doi.org/10.1098/rsos.211042

2.1 Loading data

Data for 3 animals are available in the repo. Let's load them into R and have a quick look.

# load libraries
library(aniMotum)
library(patchwork)
library(sf)
library(tidyverse)

# load harp seal locations
locs <- read_csv("harps.csv")
locs
## # A tibble: 10,884 × 8
##    id         date                lc      lon   lat  smaj  smin   eor
##    <chr>      <dttm>              <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 hp6-747-19 2019-03-23 00:08:33 3     -59.6  48.1   269    93    97
##  2 hp6-747-19 2019-03-23 00:25:33 3     -59.6  48.1   338    75    69
##  3 hp6-747-19 2019-03-23 00:37:38 1     -59.6  48.1  4984   144    83
##  4 hp6-747-19 2019-03-23 10:20:14 3     -59.7  48.2   358    89   116
##  5 hp6-747-19 2019-03-23 10:22:18 3     -59.7  48.2   239    82    96
##  6 hp6-747-19 2019-03-23 15:07:47 2     -59.7  48.2  1345    76   132
##  7 hp6-747-19 2019-03-23 15:36:07 3     -59.7  48.2   665    48    90
##  8 hp6-747-19 2019-03-23 20:03:42 B     -59.7  48.2  4369   528    42
##  9 hp6-747-19 2019-03-23 20:13:27 2     -59.7  48.3  1890    72    84
## 10 hp6-747-19 2019-03-24 01:25:02 3     -59.7  48.3   271    70    82
## # ℹ 10,874 more rows

The harp data are formatted in a pretty standard way, including Argos’ Least-Squares-based location error classes lc. This is the minimum required by fit_ssm:
- id a unique identifier for each animal (or track) in the data.frame or tibble.
- date a date-time variable in the form YYYY-mm-dd HH:MM:SS (or YYYY/mm/dd HH:MM:SS). These can be text strings, in which case aniMotum converts them to POSIX format and assumes the timezone is UTC.
- lc the location class variable common to Argos data, with classes in the set: 3, 2, 1, 0, A, B, Z.
- lon the longitude variable.
- lat the latitude variable.
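As a minimal sketch, the required columns can be assembled by hand. The values below are invented for illustration only, not taken from the harp seal data; converting the text dates to POSIXct explicitly makes the UTC assumption visible:

```r
# Minimal example of the format fit_ssm expects for Argos LS data.
# All values here are made up for illustration.
locs_min <- data.frame(
  id   = c("seal-01", "seal-01"),
  date = c("2019-03-23 00:08:33", "2019-03-23 00:25:33"),  # text date-times
  lc   = c("3", "B"),
  lon  = c(-59.6, -59.6),
  lat  = c(48.1, 48.1)
)

# aniMotum would coerce the text dates itself, assuming UTC; doing it
# explicitly documents that assumption in your own workflow:
locs_min$date <- as.POSIXct(locs_min$date,
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
```
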

The lc values determine the measurement error variances (based on independent data; see Jonsen et al. 2020) used in the SSMs for each observation.

Since 2011, Argos location data have by default been processed with CLS Argos' Kalman Filter (KF) algorithm. These data include error ellipse information for each observed location in the form of 3 variables: ellipse semi-major axis length, ellipse semi-minor axis length, and ellipse orientation.

The column names follow those for Argos LS data, with the following additions:
- smaj the Argos error ellipse semi-major axis length (m).
- smin the Argos error ellipse semi-minor axis length (m).
- eor the Argos error ellipse orientation (degrees from N).

Here, the error ellipse parameters for each observation define the measurement error variances used in the SSMs (Jonsen et al. 2020). Missing error ellipse values are allowed, in which case those observations are treated as Argos LS data.
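To sketch the KF format (again with invented values), the three ellipse columns are simply added to the minimal format above; a quick base-R check shows which rows would fall back to LS treatment because of missing ellipse values:

```r
# Argos KF format: the LS columns plus the three error-ellipse columns.
# Values are invented for illustration.
locs_kf <- data.frame(
  id   = "seal-01",
  date = as.POSIXct(c("2019-03-23 00:08:33", "2019-03-23 00:25:33"),
                    tz = "UTC"),
  lc   = c("3", "B"),
  lon  = c(-59.6, -59.6),
  lat  = c(48.1, 48.1),
  smaj = c(269, NA),   # semi-major axis (m); NA in any ellipse column
  smin = c(93, NA),    # semi-minor axis (m)  -> row treated as LS data
  eor  = c(97, NA)     # ellipse orientation (degrees from N)
)

# Which observations would be handled as LS rather than KF?
is_ls <- is.na(locs_kf$smaj) | is.na(locs_kf$smin) | is.na(locs_kf$eor)
```
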

# preliminary plot
p1 <- ggplot() +
  theme_bw() +
  geom_point(aes(x = lon, y = lat), data = locs) +
  facet_wrap(~id)
print(p1)

The plot reveals that one track has large gaps while the others have obviously erroneous location estimates.

Let’s check how frequently the tags were transmitting. This can guide the frequency of regularised locations estimated by {aniMotum}.

locs |>
  group_by(id) |>
  summarise(mean_interval_hrs = as.numeric(mean(difftime(date, lag(date)), na.rm = TRUE), "hours"),
            max_interval_hrs = as.numeric(max(difftime(date, lag(date)), na.rm = TRUE), "hours"))
## # A tibble: 3 × 3
##   id         mean_interval_hrs max_interval_hrs
##   <chr>                  <dbl>            <dbl>
## 1 hp6-747-19             2.97             303. 
## 2 hp6-749-19             0.936             15.0
## 3 hp6-756-19             1.17              54.3

On average two of the tags were transmitting hourly, with the first tag only transmitting every 3 hours. This difference is probably driven by the large gaps in transmission: 303 hours is 12.5 days!

2.2 Fit a state-space model

We can use {aniMotum} to regularise and error-correct these movement paths using a state-space model. For speed and simplicity we'll ask for daily locations, but you can easily adjust this depending on your questions. For example, in the paper I dropped the data with large gaps and then used a 6-hour interval to match the locations to the 6-hour dive summary data transmitted by the tags.

We can fit to all animals in the same call using the fit_ssm function. Here I'm fitting a correlated random walk with model = "crw"; other options are a random walk (model = "rw") and move persistence (model = "mp").

fit <- fit_ssm(locs,
               vmax = 5,
               model = "crw",
               time.step = 24,
               control = ssm_control(verbose = 0))

You can access the results and summary information about the model fit with the summary function:

summary(fit)
##   Animal id Model Time n.obs n.filt n.fit n.pred n.rr converged    AICc
##  hp6-747-19   crw   24  1971    583  1388    245    .      TRUE 22638.1
##  hp6-749-19   crw   24  5073   1339  3734    199    .      TRUE 49318.5
##  hp6-756-19   crw   24  3840   1027  2813    189    .      TRUE 42544.1
## 
## --------------
## hp6-747-19 
## --------------
##  Parameter Estimate Std.Err
##        D_x   1.8448   0.176
##        D_y   4.5803  0.5023
##      rho_p  -0.1793  0.0848
##        psi   8.9861  0.3696
## 
## --------------
## hp6-749-19 
## --------------
##  Parameter Estimate Std.Err
##        D_x   3.3592  0.1955
##        D_y   5.3051  0.3263
##      rho_p   0.0706  0.0491
##        psi   7.6369  0.1835
## 
## --------------
## hp6-756-19 
## --------------
##  Parameter Estimate Std.Err
##        D_x   5.5716  0.3673
##        D_y  23.8166  1.5672
##      rho_p  -0.1603    0.07
##        psi  14.8114   0.438

2.3 Visualise the model fit

We can quickly check the model fit via a 1-D time series plot of the model:

plot(fit,
     what = "predicted",
     type = 1,
     pages = 1)

I’ve pulled just the first animal to illustrate the plot here. The SSM-estimated locations (orange) are overlaid on top of the observations (blue). Note that when plotting the predicted (24-hour) locations, the uncertainty increases when interpolating across the gaps we highlighted earlier. You would need to carefully consider whether to include these interpolated sections in any further analysis.

A 2-D plot of the model:

plot(fit,
     what = "predicted",
     type = 2,
     pages = 1,
     ncol = 3)

This again highlights how much uncertainty we have in hp6-747-19.

We can map the predicted locations straight from the fit_ssm object as follows:

aniMotum::map(fit,
              what = "predicted")
## using map scale: 10

Often when validating models we want to assess residual plots. {aniMotum} offers the option of calculating one-step-ahead prediction residuals. These can be helpful but are computationally demanding. We calculate the residuals via the osar function and can visualise them as a time-series, QQ plot and autocorrelation plot.

res <- osar(fit)
plot(res, type = "ts", pages = 1) | plot(res, type = "qq", pages = 1) | plot(res, type = "acf", pages = 1)

The time series and autocorrelation plots suggest that the crw model is a good fit to the data, although there is some skew in the QQ plot. We could re-run the models as rw or mp and see how this impacts model fit.

2.4 Fit a move persistence model

We may be interested in inferring how an individual's behaviour changes along the movement path. Move persistence is a continuous behavioural index that captures autocorrelation in both speed and direction. For more information on move persistence please read:

  • Jonsen, ID, McMahon, CR, Patterson TA, Auger-Méthé M, Harcourt R, Hindell MA & Bestley S (2019). Movement responses to environment: Fast inference of variation among southern elephant seals with a mixed effects model. Ecology 100: e02566 (https://doi.org/10.1002/ecy.2566)

This is a different approach from estimating discrete behavioural states using methods such as hidden Markov models. For information on those, see Théo Michelot's previous EFI/ESA webinar here: https://www.youtube.com/watch?v=WELTpbB5BuU

fmpm <- fit_mpm(fit,
                what = "predicted",
                model = "jmpm",
                control = mpm_control(verbose = 0))
plot(fmpm,
     pages = 1,
     ncol = 2)

It is possible to go on to link move persistence to environmental covariates. Take a look at Jonsen et al. (2019) for the method and Grecian et al. (2022) for application to this data set.

2.5 Data extraction

You may want to extract the regularised data output from fit_ssm to use in a different application, or you may want to generate your own maps. You can do this with the grab function:

# extract predicted locations and their move persistence estimates
df_locs <- fit |> grab("predicted")
df_mpm <- fmpm |> grab("fitted")

# combine
df <- df_locs |> left_join(df_mpm)
## Joining with `by = join_by(id, date)`
df
## # A tibble: 633 × 17
##    id     date                  lon   lat      x     y    x.se    y.se         u
##    <chr>  <dttm>              <dbl> <dbl>  <dbl> <dbl>   <dbl>   <dbl>     <dbl>
##  1 hp6-7… 2019-03-23 00:00:00 -59.6  48.1 -6634. 6088. 1.00e-5 1.00e-5 -6.20e-11
##  2 hp6-7… 2019-03-24 00:00:00 -59.7  48.3 -6642. 6123. 2.28e+0 3.56e+0  8.99e- 2
##  3 hp6-7… 2019-03-25 00:00:00 -59.6  48.2 -6629. 6115. 3.13e+0 4.88e+0  1.21e+ 0
##  4 hp6-7… 2019-03-26 00:00:00 -59.5  48.2 -6621. 6108. 3.33e+0 5.17e+0  2.83e- 1
##  5 hp6-7… 2019-03-27 00:00:00 -59.5  48.2 -6620. 6110. 6.60e-1 8.10e-1 -1.02e- 1
##  6 hp6-7… 2019-03-28 00:00:00 -59.5  48.2 -6620. 6109. 3.37e+0 5.26e+0 -1.98e- 1
##  7 hp6-7… 2019-03-29 00:00:00 -59.4  48.3 -6614. 6133. 1.21e+1 6.52e+0  7.02e- 1
##  8 hp6-7… 2019-03-30 00:00:00 -59.3  48.6 -6597. 6177. 2.96e+0 3.89e+0  7.61e- 1
##  9 hp6-7… 2019-03-31 00:00:00 -59.2  48.6 -6591. 6177. 2.28e+0 3.39e+0  2.74e- 2
## 10 hp6-7… 2019-04-01 00:00:00 -59.1  48.7 -6581. 6192. 8.26e-1 1.03e+0  9.55e- 1
## # ℹ 623 more rows
## # ℹ 8 more variables: v <dbl>, u.se <dbl>, v.se <dbl>, s <dbl>, s.se <lgl>,
## #   logit_g <dbl>, logit_g.se <dbl>, g <dbl>

2.6 Manual mapping

To generate a map we need to load a suitable shapefile from the {rnaturalearth} package and define a suitable projection. We then create an {sf} object with a WGS84 projection, pass it to the geom_sf function, and specify the projection via the coord_sf function.

#install.packages("rnaturalearth")
require(rnaturalearth)
## Loading required package: rnaturalearth
# Generate a global shapefile and a simple plot
world <- ne_countries(scale = "medium", returnclass = "sf")

# To generate a plot with less distortion first define a projection i.e. Lambert Azimuthal Equal Area
prj = "+proj=laea +lat_0=60 +lon_0=-50 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"

# Create an sf version of the locs data with a WGS84 projection and add to the plot
df_sf <- df |> 
  st_as_sf(coords = c('lon', 'lat')) |>
  st_set_crs(4326)

ggplot() +
  theme_bw() +
  geom_sf(data = world) +
  geom_sf(aes(colour = g), data = df_sf, show.legend = "point") +
  scale_color_viridis_c(expression(gamma[t]), limits = c(0,1)) +
  coord_sf(xlim = c(-2000000, 2000000), ylim = c(-2500000, 2500000), crs = prj, expand = TRUE) +
  scale_x_continuous(breaks = seq(from = -130, to = 30, by = 10)) +
  facet_wrap(~id, nrow = 1)

This nicely illustrates the most likely path of the three animals from the pupping ground in the Gulf of St Lawrence as they migrate north. Colouring the points by move persistence (gamma[t]) highlights periods when the animals were moving faster and more directed (lighter colours) and when the animals were travelling slower and less directed (darker).


3 Extras

3.1 Simulate from the model

trs <- sim_fit(fit,
               what = "predicted",
               reps = 5)
plot(trs,
     ncol = 3)


3.2 Other data types

aniMotum can be used to fit a state-space model to Argos PTT, GLS or GPS data. Have a look at the help file for fit_ssm.

For Argos PTT data you need to provide either the location class, or the semi-major and semi-minor axes of the error ellipse.

For GLS data, replace the lc column with the string "GL". For GPS data, replace the lc column with the string "G".

Ideally, for geolocation data you would have standard errors associated with each location, which can be passed as columns to XXX.

If you do not have these data, then a good starting place is somewhere between 0.25 and 1 degree SE for latitude, and half that value for longitude.

For GPS data you can try a small error variance, i.e. tens of metres.
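As a sketch (with made-up values and a hypothetical tag id), setting the lc column for GPS and GLS data looks like this. I have deliberately not named the standard-error columns here; check the fit_ssm help file for the exact column names your {aniMotum} version expects:

```r
# Hypothetical GPS data in the minimal fit_ssm format; values invented.
gps <- data.frame(
  id   = "tag-01",
  date = as.POSIXct("2021-06-01 12:00:00", tz = "UTC"),
  lon  = -4.2,
  lat  = 57.5
)
gps$lc <- "G"    # all GPS fixes get location class "G"

# The same structure works for geolocation data, with lc set to "GL"
gls <- gps
gls$lc <- "GL"
```
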
